41 research outputs found

    Complete gene expression profiling of Saccharopolyspora erythraea using GeneChip DNA microarrays

    The Saccharopolyspora erythraea genome sequence, recently published, presents considerable divergence from those of streptomycetes in gene organization and function, confirming the remarkable potential of S. erythraea for producing many other secondary metabolites in addition to erythromycin. In order to investigate, at whole transcriptome level, how S. erythraea genes are modulated, a DNA microarray was specifically designed and constructed on the S. erythraea strain NRRL 2338 genome sequence, and the expression profiles of 6494 ORFs were monitored during growth in complex liquid medium

    Complete genome sequence of a serotype 11A, ST62 Streptococcus pneumoniae invasive isolate

    Background: Streptococcus pneumoniae is an important human pathogen representing a major cause of morbidity and mortality worldwide. We sequenced the genome of a serotype 11A, ST62 S. pneumoniae invasive isolate (AP200) that was erythromycin-resistant due to the presence of the erm(TR) determinant, and carried out an analysis of the genome organization and a comparison with other pneumococcal genomes. Results: The genome sequence of S. pneumoniae AP200 is 2,130,580 base pairs in length. The genome carries 2216 coding sequences (CDS), 56 tRNA, and 12 rRNA genes. Of the CDSs, 72.9% have a predicted known biological function. AP200 contains the pilus islet 2 and, although its phenotype corresponds to serotype 11A, it contains an 11D capsular locus. Chromosomal rearrangements resulting from a large inversion across the replication axis, and horizontal gene transfer events were observed. The chromosomal inversion is likely implicated in rebalancing the chromosomal architecture affected by the insertions of two large exogenous elements, the erm(TR)-carrying Tn1806 and a functional prophage designated ϕSpn_200. Tn1806 is 52,457 bp in size and comprises 49 ORFs. Comparative analysis of Tn1806 revealed the presence of a similar genetic element, or part of it, in related species such as Streptococcus pyogenes and also in the anaerobic species Finegoldia magna, Anaerococcus prevotii and Clostridium difficile. The genome of ϕSpn_200 is 35,989 bp in size and is organized in 47 ORFs grouped into five functional modules. Prophages similar to ϕSpn_200 were found in pneumococci and in other streptococcal species, showing a high degree of exchange of functional modules.
    ϕSpn_200 viral particles have morphologic characteristics typical of the Siphoviridae family and are capable of infecting a pneumococcal recipient strain. Conclusions: The sequence of the S. pneumoniae AP200 chromosome revealed a dynamic genome, characterized by chromosomal rearrangements and horizontal gene transfers. The overall diversity of AP200 is driven mainly by the presence of the exogenous elements Tn1806 and ϕSpn_200, which show large gene exchanges with other genetic elements of different bacterial species. These genetic elements likely provide AP200 with additional genes, such as those conferring antibiotic resistance, promoting its adaptation to the environment.

    Evaluation of human gene variant detection in amplicon pools by the GS-FLX parallel Pyrosequencer

    Background: A new priority in genome research is large-scale resequencing of genes to understand the molecular basis of hereditary disease and cancer. We assessed the ability of massively parallel pyrosequencing to identify sequence variants in pools. From a large collection of human PCR samples we selected 343 PCR products belonging to 16 disease genes and including a large spectrum of sequence variations previously identified by Sanger sequencing. The sequence variants included SNPs and small deletions and insertions (up to 44 bp), in homozygous or heterozygous state. Results: The DNA was combined in 4 pools containing from 27 to 164 amplicons and from 8.9 to 50.8 kb to sequence, for a total of 110 kb. Pyrosequencing generated over 80 million base pairs of data. Blind searching for sequence variations with a specifically designed bioinformatics procedure identified 465 putative sequence variants, including 412 true variants and 53 false positives (in or adjacent to homopolymeric tracts), with no false negatives. All known variants in positions covered with at least 30× depth were correctly recognized. Conclusion: Massively parallel pyrosequencing may be used to simplify and speed the search for DNA variations in PCR products. Our results encourage further studies to evaluate molecular diagnostics applications.

    A transcriptional sketch of a primary human breast cancer by 454 deep sequencing

    Background: The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results: We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Conclusion: Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling

    Community-driven development for computational biology at Sprints, Hackathons and Codefests

    Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects

    Characterization of Nucleotide Misincorporation Patterns in the Iceman's Mitochondrial DNA

    BACKGROUND: The degradation of DNA represents one of the main issues in the genetic analysis of archeological specimens. In recent years, a particular kind of post-mortem DNA modification giving rise to nucleotide misincorporation ("miscoding lesions") has been the object of extensive investigations. METHODOLOGY/PRINCIPAL FINDINGS: To improve our knowledge regarding the nature and incidence of ancient DNA nucleotide misincorporations, we have utilized 6,859 (629,975 bp) mitochondrial (mt) DNA sequences obtained from the 5,350-5,100-year-old, freeze-desiccated human mummy popularly known as the Tyrolean Iceman or Ötzi. To generate the sequences, we have applied a mixed PCR/pyrosequencing procedure allowing one to obtain a particularly high sequence coverage. As a control, we have produced a further 8,982 (805,155 bp) mtDNA sequences from a contemporary specimen using the same system and starting from the same template copy number as the ancient sample. From the analysis of the nucleotide misincorporation rate in ancient, modern, and putative contaminant sequences, we observed that the rate of misincorporation is significantly lower in modern and putative contaminant sequence datasets than in ancient sequences. In contrast, type 2 transitions represent the vast majority (85%) of the observed nucleotide misincorporations in ancient sequences. CONCLUSIONS/SIGNIFICANCE: This study provides a further contribution to the knowledge of nucleotide misincorporation patterns in DNA sequences obtained from freeze-preserved archeological specimens. In the Iceman system, ancient sequences can be clearly distinguished from contaminants on the basis of nucleotide misincorporation rates. This observation confirms a previous identification of the ancient mummy sequences made on a purely phylogenetic basis.
    The present investigation provides further indication that the majority of ancient DNA damage is reflected by type 2 (cytosine→thymine/guanine→adenine) transitions and that type 1 transitions are essentially PCR artifacts

    Computational pan-genomics: status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains

    The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

    Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency.
    We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.